Search CORE

347 research outputs found

Panoramic Annular Localizer: Tackling the Variation Challenges of Outdoor Localization Using Panoramic Annular Images and Active Deep Descriptors

Author: Bai Jian
Cheng Ruiqi
Hu Weijian
Huang Xiao
Li Huabing
Lin Shufei
Sun Dongming
Wang Kaiwei
Yang Kailun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 15/07/2019
Field of study

Visual localization is an attractive problem that estimates the camera localization from database images based on the query image. It is a crucial task for various applications, such as autonomous vehicles, assistive navigation and augmented reality. The challenging issues of the task lie in various appearance variations between query and database images, including illumination variations, dynamic object variations and viewpoint variations. In order to tackle those challenges, Panoramic Annular Localizer into which panoramic annular lens and robust deep image descriptors are incorporated is proposed in this paper. The panoramic annular images captured by the single camera are processed and fed into the NetVLAD network to form the active deep descriptor, and sequential matching is utilized to generate the localization result. The experiments carried on the public datasets and in the field illustrate the validation of the proposed system.Comment: Accepted by ITSC 201

arXiv.org e-Print Archive

Crossref

Seeing through events: Real-time moving object sonification for visually impaired people using event-based camera

Author: Hu Weijian
Ji Zihao
Wang Kaiwei
Wang Ze
Yang Kailun
Publication venue: MDPI
Publication date: 07/06/2021
Field of study

Scene sonification is a powerful technique to help Visually Impaired People (VIP) understand their surroundings. Existing methods usually perform sonification on the entire images of the surrounding scene acquired by a standard camera or on the priori static obstacles acquired by image processing algorithms on the RGB image of the surrounding scene. However, if all the information in the scene are delivered to VIP simultaneously, it will cause information redundancy. In fact, biological vision is more sensitive to moving objects in the scene than static objects, which is also the original intention of the event-based camera. In this paper, we propose a real-time sonification framework to help VIP understand the moving objects in the scene. First, we capture the events in the scene using an event-based camera and cluster them into multiple moving objects without relying on any prior knowledge. Then, sonification based on MIDI is enabled on these objects synchronously. Finally, we conduct comprehensive experiments on the scene video with sonification audio attended by 20 VIP and 20 Sighted People (SP). The results show that our method allows both participants to clearly distinguish the number, size, motion speed, and motion trajectories of multiple objects. The results show that our method is more comfortable to hear than existing methods in terms of aesthetics

KITopen

Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning

Author: Huang Weijian
Li Cheng
Liu Jiarun
Wang Shanshan
Yang Hao
Publication venue
Publication date: 17/09/2023
Field of study

Recently, multi-modal vision-language foundation models have gained significant attention in the medical field. While these models offer great opportunities, they still face a number of challenges, such as the requirement for fine-grained knowledge understanding in computer-aided diagnosis and capability of utilizing very limited or no task-specific labeled data in real-world clinical applications. In this study, we present MaCo, a novel multi-modal medical foundation model that explores masked contrastive learning to achieve granular alignment and zero-shot learning for a variety of medical imaging tasks. MaCo incorporates a correlation weighting mechanism to adjust the correlation between masked image patches and their corresponding reports, thereby enhancing the representation learning capabilities. We evaluate MaCo on six well-known open-source X-ray datasets, and the experimental results show it outperforms seven state-of-the-art approaches for classification, segmentation, and zero-shot phase grounding, demonstrating its great potential to promote a wide range of medical image analysis tasks

arXiv.org e-Print Archive